Conformance Tests for SEP-2322 MRTR #188
Hello - saw this PR while looking at the 2322 finalizing threads. I've been porting our local MRTR + Tasks Extension scenarios into a fork of the official suite at panyam/mcpconformance:feat/tasks-mrtr-extension. It looks like our ephemeral-flow scenarios cover similar ground to your A1-A7 set, and we've also built out the wider Tasks Extension perimeter (lifecycle, capability negotiation, dispatch, request-state, headers, notifications), which this PR doesn't span. The bridge scenario (Tasks + MRTR partial fulfillment) is narrow on our side - 3 checks - vs your incomplete-result-tasks.ts, which goes deeper. The two look mostly complementary. If you're planning to revive this PR after the SEP finalizes, happy to help refresh the wire format and pair on the bridge surface. Otherwise I can open a separate PR for the wider Tasks Extension scope and defer the ephemeral-flow / bridge depth to whatever lands here. Or some merged form, whatever's easiest for you. Just wanted to make sure I wasn't undoing anything 🙏
Mrtr tests 5 14 update
* add missing tests and update sep-2322.yaml
Hey @panyam, would love some help here. I've updated this PR now that 2322 is Approved and checked in. I removed the tasks tests and marked the checks as excluded since we have moved Tasks to an extension. I saw you opened one for the Task extension, which I think is the right path forward for the Task Conformance tests. Appreciate the help.
@CaitieM20 Thanks - and yep, totally agree with the split. PR 262 (the Task extension side) is now in a position to be merged. Luca has signed off and asked pcarleton to give it a final look. We're shipping it with two cross-SEP suites deliberately skipped (mrtr-tasks-composition and tasks-status-notifications via subscriptions/listen), with a fast-follow planned for both. These both overlap naturally with the MRTR side here - the composition test in particular needs to encode the asymmetric requestState invariant (MRTR phase carries requestState, Task phase forbids it), which only lands cleanly once both PRs are in. So when this one merges I can bring up the composition harness against whatever fixture shape you settle on. I'll review the refreshed diff here in the meantime and surface anything from the Task-side experience that's worth folding in - but overall thanks for all the updates. Looking great and can't wait for it!
* remove duplicates in index, rename test cases to be consistent
* add negative tests
* refactor into everything-server and update negative tests
* fix conformance tests
…quirement

The traceability schema recognizes only check/text/url/issue/excluded on a requirement row. The 'note:' field on the scenario-gate rows was silently dropped, so those 11 rows would have been ingested as ordinary requirement rows whose text is not a spec sentence, inflating the SEP-2322 requirement count on the traceability dashboard.

- Remove the 10 flow-gate rows (sep-2322-*-complete, sep-2322-multi-round-r*, sep-2322-non-tool-*) plus sep-2322-multiple-inputs-incomplete. The checks are unchanged and still emitted by the scenarios; their IDs now surface in the manifest's 'untracked' list, which is the designed home for scenario scaffolding that doesn't map to an RFC-2119 sentence.
- Move 'inputRequests keys ... MUST be unique' to the excluded list: duplicate JSON object keys are collapsed by the parser before the harness can observe them, so the requirement is not testable at the protocol level. The check previously paired with it actually verifies that the server returns three inputRequests of different method types, which is a flow gate, not a key-uniqueness test.
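To illustrate why key uniqueness cannot be observed at the protocol level, here is a minimal sketch of what a standard JSON parser does with a repeated key. The payload shape (an `inputRequests` object keyed by request id, each entry carrying a `method`) is an assumption made for illustration and is not copied from the SEP text or the suite.

```typescript
// Assumed shape: inputRequests is a JSON object keyed by request id.
// The key "a" and the nesting are illustrative only.
const wire =
  '{"inputRequests":{"a":{"method":"elicitation/create"},' +
  '"a":{"method":"sampling/createMessage"}}}';

const parsed = JSON.parse(wire) as {
  inputRequests: Record<string, { method: string }>;
};

// JSON.parse keeps only the last value for a repeated key, so the
// duplicate is already gone by the time any check can inspect the object.
const observedKeys = Object.keys(parsed.inputRequests);
const survivingMethod = parsed.inputRequests["a"]?.method;

console.log(observedKeys, survivingMethod);
```

A harness that wanted to detect the duplicate would have to inspect the raw bytes with a custom tokenizer, which is outside what a protocol-level check normally sees.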
The spec says the client MUST echo back the exact value of requestState and MUST NOT inspect, parse, or modify it. The check previously parsed the returned state as JSON and compared two fields, so a client that deserialized the state and re-serialized it (different key order, whitespace, extra fields) would still pass despite having modified the opaque value. Store the exact string the mock server sent and compare the echoed value with strict string equality instead. Also include the sent value in the check details so a mismatch is diagnosable from the report.
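The difference between the two comparison strategies can be sketched as follows. This is not the suite's actual check; `CheckResult` and `checkRequestStateEcho` are hypothetical names, and the state payload is made up for the example.

```typescript
// Hypothetical result shape; the real suite's check type may differ.
interface CheckResult {
  status: "SUCCESS" | "FAILURE";
  details: { sent: string; echoed: unknown };
}

// Compare the echoed requestState with the exact string the mock server
// sent. No parsing: the value is treated as an opaque string.
function checkRequestStateEcho(sent: string, echoed: unknown): CheckResult {
  const ok = typeof echoed === "string" && echoed === sent;
  // Include the sent value so a mismatch can be diagnosed from the report.
  return { status: ok ? "SUCCESS" : "FAILURE", details: { sent, echoed } };
}

// Illustrative state string containing whitespace.
const sentState = '{"step": 1, "token": "abc"}';

// A client that deserializes and re-serializes the state alters its bytes.
const roundTripped = JSON.stringify(JSON.parse(sentState));

const exactEcho = checkRequestStateEcho(sentState, sentState);
const modifiedEcho = checkRequestStateEcho(sentState, roundTripped);

console.log(exactEcho.status, modifiedEcho.status);
```

A field-by-field comparison would accept `roundTripped` because `step` and `token` still match, whereas strict string equality rejects it because the whitespace has changed.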
pcarleton
left a comment
LGTM!
Left 2 small tweaks in follow-up commits, i'll optimistically merge, but lmk if you disagree with either
* chore: refresh SEP traceability manifest (typescript-sdk@main)
  Regenerated from a client+server suite run against typescript-sdk@5fc42e9be115 following the recipe in .github/workflows/traceability.yml. New entries since the last refresh (typescript-sdk@22595b96):
  - SEP-2322 (MRTR, #188): 17 tested, 0 untested, 16 excluded, 3 untracked
  - SEP-2549 (TTL for list results, #275): 7 tested, 0 untested, 13 excluded
  - SEP-2260: 12 excluded rows, no checks
  - SEP-2207: yaml rows added since the last refresh now appear (1 tested, 1 untested: sep-2207-server-no-offline-access)
  No previously-tested requirement regressed.
* Exclude sep-2207 server offline_access guidance until RS auth scenarios exist
  sep-2207-server-no-offline-access was declared in the yaml but no scenario emits it, so it surfaced as the only untested requirement in the refreshed manifest. The check needs to probe the SDK server's Protected Resource Metadata scopes_supported and WWW-Authenticate challenge scope, and the server suite does not yet exercise the SDK server as an OAuth protected resource at all. Mark the requirement excluded with a pointer to #116 (server-side authorization baseline) rather than leaving it as a permanently-untested row; revisit when server-side authorization scenarios land.
Draft Conformance tests for the SEP-2322: Multi Round-Trip Requests
Also added code to client-helper.ts for making raw MCP requests (i.e. basic JSON requests). This will be generally useful for draft features that may not have reference implementations yet.
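As a rough idea of what a raw request involves, the sketch below builds a JSON-RPC 2.0 envelope and the HTTP options for posting it without going through an SDK client. It is not the code in client-helper.ts; `buildRawMcpRequest`, `RawRequestInit`, and the `echo` tool name are hypothetical, and the headers assume an HTTP transport that accepts either a JSON or an event-stream response.

```typescript
// Hypothetical plain-object shape, compatible with fetch's init argument.
interface RawRequestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the HTTP options for one raw JSON-RPC 2.0 request.
function buildRawMcpRequest(
  id: number,
  method: string,
  params: Record<string, unknown> = {}
): RawRequestInit {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Assumption: the server may answer with plain JSON or an SSE stream.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify({ jsonrpc: "2.0", id, method, params }),
  };
}

// Example: a tools/call request for a hypothetical "echo" tool.
const rawInit = buildRawMcpRequest(1, "tools/call", {
  name: "echo",
  arguments: {},
});

console.log(rawInit.body);
```

The resulting object can be passed to any HTTP client, for example `fetch(serverUrl, rawInit)` with a `serverUrl` of your choosing. Because the body is plain JSON, a test can add draft fields that an SDK's typed client would not yet accept.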
Motivation and Context
See SEP
How Has This Been Tested?
Conformance tests and the reference implementation are both work in progress.
Breaking Changes
Yes, see SEP
Types of changes
Checklist
Additional context